Summarizing a Document Stream

Authors

  • Hiroya Takamura
  • Hikaru Yokono
  • Manabu Okumura

Abstract

We introduce the task of summarizing a stream of short documents on microblogs such as Twitter. On microblogs, users post thousands of short documents on a given topic, such as a sports match or a TV drama. Two notable characteristics of microblog data are that documents are often highly redundant and that they are aligned on a timeline. There can be thousands of documents on a single event within a topic, and two very similar documents will refer to two distinct events when they are temporally distant. We examine the microblog data to better understand these characteristics, and propose a summarization model for a stream of short documents on a timeline, along with a fast approximate algorithm for generating a summary. We empirically show that our model generates good summaries on datasets of microblog documents about sports matches.
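
The abstract does not spell out the model itself, so the following is only a minimal illustrative sketch of the idea that similar documents should count as the same event only when they are also close in time. Everything in it is an assumption made for illustration and not the authors' method: the Jaccard word overlap, the exponential time decay with a hypothetical parameter `tau`, and the greedy coverage-based selection are all stand-ins.

```python
import math


def jaccard(tokens_a, tokens_b):
    """Word-overlap similarity between two token collections."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def time_aware_similarity(doc_a, doc_b, tau=60.0):
    """Text similarity damped by temporal distance.

    Each document is a (timestamp_in_seconds, text) pair. `tau` is a
    hypothetical decay constant in seconds, not a value from the paper.
    """
    (t_a, text_a), (t_b, text_b) = doc_a, doc_b
    decay = math.exp(-abs(t_a - t_b) / tau)
    return jaccard(text_a.lower().split(), text_b.lower().split()) * decay


def greedy_summary(docs, k, tau=60.0):
    """Greedily pick up to k documents that best cover the stream.

    Coverage of a document is its highest time-aware similarity to any
    selected document; each step adds the candidate with the largest
    total coverage gain. Returns the selected indices in ascending order.
    """
    selected = []
    best_cover = [0.0] * len(docs)
    for _ in range(min(k, len(docs))):
        best_gain, best_idx = -1.0, None
        for i, cand in enumerate(docs):
            if i in selected:
                continue
            gain = sum(
                max(0.0, time_aware_similarity(cand, d, tau) - best_cover[j])
                for j, d in enumerate(docs)
            )
            if gain > best_gain:
                best_gain, best_idx = gain, i
        selected.append(best_idx)
        for j, d in enumerate(docs):
            sim = time_aware_similarity(docs[best_idx], d, tau)
            best_cover[j] = max(best_cover[j], sim)
    return sorted(selected)


# Two bursts of identical text, one hour apart (made-up example data).
stream = [
    (0, "goal by team a"),
    (5, "goal by team a"),
    (3600, "goal by team a"),
    (3605, "goal by team a"),
]
summary_indices = greedy_summary(stream, k=2)
```

With this toy stream, the time decay makes the two bursts behave like separate events, so a two-document summary takes one document from each burst instead of two copies from the same burst.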

Similar resources

Discovery of Rare Sequential Topic Patterns in Document Stream

When and Where: Predicting Human Movements Based on Social Spatial-Temporal Events Ning Yang*, Sichuan University; Xiangnan Kong, University of Illinois at Chicago; Fengjiao Wang, University of Illinois at Chicago; Philip Yu, University of Active Multitask Learning Using Both Latent and Supervised Shared Topics Ayan Acharya*, University of Texas at Austin; Raymond Mooney, University of Texas at...

On Summarizing Graph Streams

Graph streams, which refer to graphs whose edges are updated sequentially in the form of a stream, have wide applications such as cyber security, social networks and transportation networks. This paper studies the problem of summarizing graph streams. Specifically, given a graph stream G, directed or undirected, the objective is to summarize G as SG with much smaller (sublinear) space, linear...

Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...

Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling

Emerging data stream management systems approach the challenge of massive data distributions which arrive at high speeds while there is only small storage by summarizing and mining the distributions using samples or sketches. However, data distributions can be “viewed” in different ways. A data stream of integer values can be viewed either as the forward distribution f(x), i.e., the number of oc...

Summarizing Noisy Documents

We investigate the problem of summarizing text documents that contain errors as a result of optical character recognition. Each stage in the process is tested, the error effects analyzed, and possible solutions suggested. Our experimental results show that current approaches, which are developed to deal with clean text, suffer significant degradation even with slight increases in the noise leve...

Publication date: 2011